Some of the topics seen at RStudio conference 2019

Marta Karas
February 12, 2019

Outline

  1. RStudio conference vs UseR! conference

  2. gganimate (with ggplot intro) for animated plots

  3. gt to turn a data table into “information-rich, publication-quality” table outputs

  4. pagedown to get paged HTML documents








Note: A number of screenshots, images, and citations are used throughout this presentation. I provide references on the last slides.


RStudio 2019 conference

  • January 15-18, 2019 | Austin, TX, USA

  • Talks focused on RStudio, Inc. products: RStudio, Shiny, R packages, RStudio Server, RStudio Connect

  • Call for poster submission only

  • Diversity scholarships were available (😏 unsuccessful applicants got a conference fee discount)

useR! 2019 conference

  • July 9-12, 2019 | Toulouse, France

  • Topics seem more welcoming to academic work (an example from last year: “Using mommix for fast, large-scale genome-studies in the presence of gene-environment and gene-gene interaction”)

  • Call for submissions: tutorials, oral presentations, lightning talks, posters (deadline: Jan 18 for tutorials, Mar 1, 2019 for the other submissions)

  • Diversity scholarships are available (Deadline: Mar 1, 2019)

Successful poster abstract submission for RStudio conf?

I wish I had done this:


`gganimate` package

  • Extension to ggplot2, provides “implementation of the grammar of animated graphics”

  • Returns a gif_image object which is a simple wrapper around a path to a gif file

    • can be easily embedded in R Markdown documents
    • can use other renderers (e.g., to create video files)
  • Available on CRAN: install.packages("gganimate").

    Development version can be installed from GitHub ("thomasp85/gganimate")
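
A minimal sketch of how the rendering step might look, assuming the gifski package is installed for GIF output and the av package for video output. The toy data frame and the render_demo() helper are hypothetical names introduced only for illustration.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: one point moving along a curve over 10 time steps
toy <- data.frame(step = 1:10, x = 1:10, y = (1:10)^2)

# Build the animation specification (nothing is rendered yet)
anim <- ggplot(toy, aes(x, y)) +
  geom_point(size = 4) +
  transition_time(step)

# Hypothetical helper: render to a GIF, or to a video file when video = TRUE
render_demo <- function(anim, file, video = FALSE) {
  renderer <- if (video) av_renderer() else gifski_renderer()
  rendered <- animate(anim, nframes = 20, fps = 10, renderer = renderer)
  anim_save(file, animation = rendered)
  invisible(rendered)
}

# render_demo(anim, "toy.gif")                # GIF via gifski
# render_demo(anim, "toy.mp4", video = TRUE)  # video via av
```

The calls at the bottom are left commented out because rendering writes files and needs the renderer packages mentioned above.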


Quick intro to `ggplot2` package

  • ggplot2 - R package to create relatively complicated plots in a relatively simple way

  • Uses the “grammar of graphics”, which “tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars)”

  • To make graphics using ggplot2, the data needs to be in a tidy format

Tidy vs messy data

Tidy data:

  1. Each variable forms a column.

  2. Each observation forms a row.

  3. Each type of observational unit forms a table.

Messy data:

  • Column headers are values, not variable names.

  • Multiple variables are stored in one column.

  • Variables are stored in both rows and columns.

  • Multiple types of observational units are stored in the same table.

  • A single observational unit is stored in multiple tables.

Tidy data: Examples

Each variable forms a column. Each observation forms a row.


Messy data: Example

Column headers are values, not variable names


Read more about tidy data and see other examples: Tidy Data by Hadley Wickham
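
As a small illustration of the “column headers are values” case, here is a sketch that reshapes a messy table into a tidy one. It assumes tidyr version 1.0.0 or later (for pivot_longer()); the data values and column names are made up for the example and are not taken from the Tidy Data paper.

```r
library(tidyr)

# Hypothetical messy table: the column headers "2017" and "2018" are
# values of a "year" variable, not variable names
messy <- data.frame(
  country = c("A", "B"),
  `2017`  = c(10, 20),
  `2018`  = c(11, 22),
  check.names = FALSE
)

# Gather the year columns into a (year, count) pair of columns,
# so that each row is one country-year observation
tidy <- pivot_longer(
  messy,
  cols      = c(`2017`, `2018`),
  names_to  = "year",
  values_to = "count"
)

tidy
```

The tidy version has one row per country and year, which is the shape ggplot2 expects when year is mapped to an aesthetic.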

ggplot2: make a simple plot

Draw line, mark points.

######################################################################
## Make some foo data frame
df <- data.frame(time = 1:10, value = rnorm(10))

## Plot
library(ggplot2)
ggplot(df, aes(x = time, y = value)) + 
  geom_point() + 
  geom_line()


ggplot2: make simple plot + put some decor

Draw line, mark points, modify the look of the line and points.

#################################################################################
ggplot(df, aes(x = time, y = value)) + 
  geom_point(size = 5, 
             color = "red") + 
  geom_line(linetype = 2, 
            color = "brown", 
            size = 0.8) + 
  theme_bw(base_size = 20) + 
  labs(x = "My x axis label", 
       y = "My y axis label", 
       title = "My plot title")


ggplot2: group and color by variable

#################################################################################
## Make some foo data frame
##
## - 20 different items in 2 different categories 
## - value time series for each item  
## - 100 time points of data collection for each item
## 
set.seed(1)
time  <- as.vector(replicate(20, 1:100))
item  <- as.vector(sapply(1:20, function(i) rep(i, 100)))
value <- as.vector(replicate(20, cumsum(rnorm(100))))
categ <- as.vector(sapply(1:20, function(i) rep(sample(c("A", "B"), 1), 100)))
df    <- data.frame(time, item, value, categ)

str(df)
'data.frame':   2000 obs. of  4 variables:
 $ time : int  1 2 3 4 5 6 7 8 9 10 ...
 $ item : int  1 1 1 1 1 1 1 1 1 1 ...
 $ value: num  -0.626 -0.443 -1.278 0.317 0.646 ...
 $ categ: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...

ggplot2: group and color by variable [cont.]

Grouping lines by item variable.

#############################################
ggplot(df, 
       aes(x = time,
           y = value, 
           group = item)) + 
  geom_line() + 
  theme_grey(base_size = 20)


Grouping lines by item, colouring lines by categ variable.

#############################################
ggplot(df, 
       aes(x = time, 
           y = value, 
           group = item, 
           color = categ)) + 
  geom_line() + 
  labs(color = "Category: ") + 
  theme_grey(base_size = 20) 


ggplot2: group and color by variable: boxplot

Grouping boxplots by item variable.

#############################################
ggplot(df, 
       aes(x = item, y = value, 
           group = item)) + 
  geom_boxplot()  +  
  labs(x = "Item ID", 
       y = "Value") + 
  theme_grey(base_size = 20) 


Grouping boxplots by item variable, filling with color by categ variable.
Use alpha to make boxplot fill transparent.

#############################################
ggplot(df, 
       aes(x = item, y = value, 
           group = item, 
           fill = categ)) + 
  geom_boxplot(alpha = 0.3)  +  
  labs(x = "Item ID", 
       y = "Value", 
       fill = "Category: ") + 
  theme_grey(base_size = 20) + 
  theme(legend.position = "top")  


ggplot2: split plot into panels by variable

####################################################################
## Make some new foo data 
categ <- sample(c("cat_A", "cat_B"), 1000, replace = TRUE)
id    <- as.vector(replicate(1000/4, paste0("ID_", 1:4)))
value <- rnorm(1000)
df    <- data.frame(categ, id, value)

ggplot(df, aes(x = value)) + 
  geom_histogram(fill = "yellow", color = "black") +  
  facet_grid(id ~ .) + 
  labs(x = "Value", y = "Count") + 
  theme_grey(base_size = 20)  


####################################################################
ggplot(df, aes(x = value)) + 
  geom_histogram(fill = "blue", color = "black", alpha = 0.1) +  
  facet_grid(id ~ categ) + 
  labs(x = "Value", y = "Count") + 
  theme_bw(base_size = 20)  


See also: facet_wrap.
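
A short sketch of facet_wrap() for comparison, using a made-up data frame (df_wrap is a hypothetical name). Where facet_grid(id ~ .) stacks one panel per row, facet_wrap() wraps the panels into a grid, which can be handier when one variable has many levels.

```r
library(ggplot2)

# Hypothetical data: 6 groups, 200 values each
set.seed(1)
df_wrap <- data.frame(
  id    = rep(paste0("ID_", 1:6), each = 200),
  value = rnorm(6 * 200)
)

# facet_wrap() lays the 6 panels out in a grid with 3 columns,
# whereas facet_grid(id ~ .) would stack them in 6 rows
plt_wrap <- ggplot(df_wrap, aes(x = value)) +
  geom_histogram(bins = 30, fill = "yellow", color = "black") +
  facet_wrap(~ id, ncol = 3) +
  labs(x = "Value", y = "Count")

plt_wrap
```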

gganimate: Example 1

Without gganimate:

Boxplot of Miles/(US) gallon, stratified by number of cylinders (x-axis) and by number of gears (one panel row per gear value).

###############################################
plt <- 
  ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot() + 
  facet_grid(gear ~ .) + 
  labs(x = 'Number of cylinders', 
       y = 'Miles/(US) gallon') + 
  theme_grey(base_size = 20)

## Print the plot stored in plt
plt


gganimate: Example 1 (cont.)

library(gganimate)

ggplot(mtcars, aes(factor(cyl), mpg)) + 
  geom_boxplot() + 
  # below: gganimate code
  transition_states(
    gear,
    transition_length = 0.5,
    state_length = 0.5
  ) +
  enter_fade() + 
  exit_shrink() +
  ease_aes('sine-in-out') + 
  labs(title = 'Gear: {closest_state}', 
       x = 'Number of cylinders', 
       y = 'Miles/(US) gallon') 
  • transition_states() splits the data into multiple states
  • enter_fade(), exit_shrink() define how data that appears or disappears between states enters and exits the animation
  • ease_aes() defines how a value changes into another (will it progress linearly, or maybe start slowly and then build up momentum?)
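
To make the idea of easing concrete, here is a rough base R illustration. It does not call gganimate's internal code; ease_linear() and ease_sine_in_out() are hypothetical helper names, and the sine formula is a common definition of sine in-out easing that is assumed here.

```r
# Progress through a transition, from 0 (start) to 1 (end)
t <- seq(0, 1, by = 0.25)

# Linear easing: the value changes at a constant rate
ease_linear <- function(t) t

# Sine in-out easing: slow start, faster middle, slow end
ease_sine_in_out <- function(t) (1 - cos(pi * t)) / 2

# Interpolate a value moving from 10 to 20 under both easings
from <- 10
to   <- 20
data.frame(
  progress = t,
  linear   = from + (to - from) * ease_linear(t),
  sine     = from + (to - from) * ease_sine_in_out(t)
)
```

Both easings start at 10, pass through 15 at the halfway point, and end at 20, but a quarter of the way in the sine version has moved less (about 11.46 versus 12.5).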


gganimate: Example 2

library(gapminder)

ggplot(gapminder, 
       aes(gdpPercap, lifeExp, 
           size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~ continent, ncol = 2) +
  # Here comes the gganimate specific code
  transition_time(year) +
  labs(title = 'Year: {frame_time}', 
       x = 'GDP per capita', 
       y = 'life expectancy') +
  ease_aes('linear')
  • transition_time() is a variant of transition_states() intended for data where the states represent specific points in time; the transition length between states is set to correspond to the actual time difference between them


gt: generate information-rich, publication-quality tables from R

  • gt philosophy: construct a wide variety of tables with a cohesive set of table parts


gt: workflow

  • Begin with preprocessed table data (be it a tibble or a data.frame)
  • Compose gt object with the elements you need for the task at hand
  • Output can be HTML, LaTeX, or RTF. All work beautifully inside R Markdown documents.
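
The workflow above could be sketched roughly as follows, assuming a recent gt release. The tbl_data data frame is made up for the example; as_raw_html(), as_latex(), and as_rtf() are the gt export functions for the three output formats.

```r
library(gt)

# Hypothetical small, already preprocessed input table
tbl_data <- data.frame(item = c("a", "b"), price = c(1.5, 20))

# Compose a gt object with the parts needed for the task at hand
tbl <- tbl_data %>%
  gt() %>%
  tab_header(title = "Demo table") %>%
  fmt_currency(columns = "price", currency = "USD")

# Export the same object to different output formats
html_out  <- as_raw_html(tbl)  # HTML as a character string
latex_out <- as_latex(tbl)     # LaTeX code
rtf_out   <- as_rtf(tbl)       # RTF code
```

Inside an R Markdown document the export calls are usually unnecessary: printing tbl in a chunk lets gt pick the format that matches the document output.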


gt: use example data

  • sp500: data of daily price indicators for the S&P 500 index from 1950 to 2015

  • HTML output produced via printing data frame

# devtools::install_github("rstudio/gt")
library(tidyverse)
library(gt)
start_date <- "2010-06-07"
end_date   <- "2010-06-14"

out1 <- 
  sp500 %>%
  filter(date >= start_date & date <= end_date) %>%
  select(-adj_close) %>%
  mutate(date = as.character(date))
out1


gt: Example

out1 %>%
  gt() %>%
  tab_header(
    title = "S&P 500",
    subtitle = glue::glue("{start_date} to {end_date}")
  ) %>%
  fmt_date(
    columns = vars(date),
    date_style = 3
  ) %>%
  fmt_currency(
    columns = vars(open, high, low, close),
    currency = "USD"
  ) %>%
  fmt_number(
    columns = vars(volume),
    scale_by = 1 / 1E9,
    pattern = "{x}B"
  )
  • tab_header - add a table header with a title (and subtitle)
  • fmt_date - format date values according to certain style
  • fmt_currency - do currency-based formatting with fine control options
  • fmt_number - do number-based formatting so that the targeted values are rendered with a “higher consideration for tabular presentation”


  • HTML output produced with: kable()

library(knitr)
library(kableExtra)

out1 %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped"), 
                font_size = 15) %>%
  column_spec(6, bold = TRUE, 
              background = "yellow") %>%
  row_spec(4:5, color = "white", 
           background = "#D7261E")


Yihui Xie talk

Yihui Xie: main author of the knitr R package and the R Markdown document format

  • Markdown - “a plain text formatting syntax” designed to be as readable as possible

  • R Markdown = Markdown + R code chunks

  • knitr - executes the computer code embedded in Markdown, and converts R Markdown to Markdown

  • Pandoc - renders Markdown to the output format you want (PDF, HTML, Word, etc.)
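
The two steps above can be tried out on a tiny in-memory document. This is only a sketch: the document content is made up, and the Pandoc step is shown as a comment because it needs a file on disk and a Pandoc installation.

```r
library(knitr)

# Build a tiny R Markdown document in memory.
# The chunk fence is assembled with strrep() only to avoid nesting
# literal fences inside this example.
fence <- strrep("`", 3)
rmd <- c(
  "# Demo report",
  "",
  "The sum is computed below.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# Step 1 (knitr): run the R chunks and produce plain Markdown
md <- knit(text = rmd, quiet = TRUE)
cat(md)

# Step 2 (Pandoc, via rmarkdown): convert to the final output format.
# For a file on disk both steps are typically done in one call, e.g.
# rmarkdown::render("report.Rmd", output_format = "html_document")
# ("report.Rmd" is a placeholder file name)
```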

Talk topic: pagedown package


`pagedown` package

From package website:

Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome) to generate PDF. No need to install LaTeX to get beautiful PDFs.

Also:

Description: Use the paged media properties in CSS and the JavaScript library 'paged.js' to split the content of an HTML document into discrete pages. Each page can have its page size, page numbers, margin boxes, and running headers, etc. Applications of this package include books, letters, reports, papers, business cards, resumes, and posters.
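
A minimal sketch of how this might be used from R, assuming the rmarkdown and pagedown packages are installed and Google Chrome or Chromium is available. The paged_pdf() helper and the file name are hypothetical.

```r
# Hypothetical helper: render an R Markdown file to paged HTML,
# then print that HTML to PDF through Chrome
paged_pdf <- function(rmd_file) {
  html_file <- rmarkdown::render(
    rmd_file,
    output_format = pagedown::html_paged()
  )
  # chrome_print() needs Google Chrome or Chromium on the system
  pagedown::chrome_print(html_file)
}

# paged_pdf("report.Rmd")  # "report.Rmd" is a placeholder file name
```

The same output format can also be selected in the YAML header of the document (output: pagedown::html_paged), in which case the usual Knit button produces the paged HTML.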


Resources

  • The official repo with abstracts for every session, workshop (together with workshop files, free to download for most if not all of them), and e-poster can be accessed here [link].

References

  • The gganimate materials (description, gifs, R code examples) were sourced from gganimate package website: https://gganimate.com/.

  • The gt materials (description, images, R code examples) were sourced from gt package website: https://github.com/rstudio/gt.

  • Text on the “Quick intro to ggplot2 package” slide was partially sourced from Jeff Leek's materials, available here.

  • Text from the “Tidy vs messy data” slide, and table screenshots from the slides “Tidy data: Examples” and “Messy data: Example”, were sourced from the Tidy Data paper by Hadley Wickham, available here.

  • pagedown package logo, text and examples were sourced from package documentation and (mostly) presentation by Yihui Xie and Romain Lesur, available here.